The Curse of Dimensionality & Building a Deeper CNN
Let's look at what happens to our classic SVM when we move away from tiny MNIST digits to harder, real-world color images.
The Problem with Flattening
To feed harder images (like 32x32 color images from CIFAR-10) into an SVM, we must flatten them into a 1D vector. Since color images have 3 RGB channels, the math becomes:
\( 3 \text{ channels} \times 32 \text{ height} \times 32 \text{ width} = \mathbf{3,072} \text{ features} \)
As the number of features grows, SVM training becomes far more expensive: kernel SVMs scale poorly with both the feature count and the number of training samples (roughly quadratic to cubic in the sample count). The SVM becomes incredibly slow (CPU bound) and much less accurate, because flattening discards all spatial relationships between neighboring pixels.
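To make the feature count concrete, here is a quick sketch in NumPy (using a random array as a stand-in for a real CIFAR-10 image) showing that flattening a 3x32x32 image yields exactly 3,072 features:

```python
import numpy as np

# A single CIFAR-10-sized color image: 3 channels x 32 height x 32 width
image = np.random.rand(3, 32, 32)

# Flattening into a 1D vector for the SVM discards the 2D neighborhood
# structure -- a pixel and the pixel directly below it end up 32 slots apart.
features = image.reshape(-1)
print(features.shape)  # (3072,)
```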
Building a Deeper Brain: CIFARCNN
To handle these more complex, harder images, we need a deeper brain. We upgrade from our basic model to CIFARCNN. Here is how it scales up:
Upgrading the Architecture
- More Filters: `conv1` has 32 filters, `conv2` has 64 filters.
- Larger Fully Connected Layer: We increase the density to 512 neurons.
- Why? More filters and larger layers give the network the capacity to learn more complex patterns (like the texture of dog fur versus the smooth, reflective metal of a car).
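The upgrades above can be sketched as a PyTorch module. The filter counts (32, 64) and the 512-neuron fully connected layer come from the text; the kernel sizes, padding, and max-pooling choices are assumptions filled in to make the sketch runnable, not details specified by the original:

```python
import torch
import torch.nn as nn

class CIFARCNN(nn.Module):
    """A sketch of the deeper CNN: 32 -> 64 filters, then a 512-neuron FC layer."""

    def __init__(self, num_classes=10):
        super().__init__()
        # Kernel size 3 with padding 1 preserves spatial size (an assumption)
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)   # 32 filters
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)  # 64 filters
        self.pool = nn.MaxPool2d(2, 2)          # halves height and width
        self.fc1 = nn.Linear(64 * 8 * 8, 512)   # larger fully connected layer
        self.fc2 = nn.Linear(512, num_classes)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))  # 3x32x32  -> 32x16x16
        x = self.pool(self.relu(self.conv2(x)))  # 32x16x16 -> 64x8x8
        x = torch.flatten(x, 1)                  # -> 4096 features per image
        x = self.relu(self.fc1(x))               # -> 512
        return self.fc2(x)                       # -> one score per class

# Usage: a batch of one 32x32 RGB image produces 10 class scores
model = CIFARCNN()
scores = model(torch.randn(1, 3, 32, 32))
print(scores.shape)  # torch.Size([1, 10])
```

Note that unlike the SVM, the convolutional layers see the image as a 3x32x32 grid, so spatial context is preserved until the final flatten before the fully connected layers.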